ABSTRACT

A method is developed for distributing the computation
of graphics primitives on a parallel processing network.
Off-the-shelf transputer boards are used to perform the
graphics transformations and scan-conversion tasks that
would normally be assigned to a single transputer-based
display processor. Each node
in the network performs a single graphics primitive 
computation. Frequently requested tasks can be dupli- 
cated on several nodes. 

The results indicate that the current distribution of
commands on the graphics network degrades performance
when compared to the graphics display board alone.
Performing more computation per node for every
communication (that is, assigning more complex tasks to
each node) may yield the desired increase in throughput.

INTRODUCTION 

In an effort to increase the graphics rendering
speed on a transputer-based display board, a method
has been developed that offloads the scan-conversion
tasks to a network of other processors and frees the
display board for performing only display tasks.

NETWORK ARCHITECTURE 

The network architecture of the graphics computa- 
tion network is shown in Figure 1. The input control 
master node reads in the drawing command from an 
application program. Since the configuration of the 
network and the primitives present on each node of 
the network are known, a decision can be made on the 
correct routing path for the command. For every com- 
mand received by the input master node, a copy is 
sent to the output master node. These commands are 
queued up in a first-in-first-out (FIFO) buffer on 
the output master control node for later processing. 

Commands that are not processed on the graphics
network, such as display board hardware commands, are
sent directly to the FIFO buffer on the output master
control node, and no command is sent to the graphics
network.

NETWORK COMMUNICATION 

The graphics commands are distributed through 
the network by using smart buffer processes that run 
concurrently (multi-tasked) on each processor in the 
network. The configuration of the buffers contained 
on each processor in the network is shown in Figure 2. 

Commands sent anywhere on the network are prefixed
with a command byte that specifies the type of data to
expect next. There are four tags that each input buffer
uses to make network routing decisions. They are as
follows:

tag.global
tag.work
tag.dump
tag.result

The input buffer process controls commands sent 
from both the input and output master control nodes. 
The input buffer reads a command from any one of 
three input links. These are multiplexed inputs that 
use the occam ALT construct (Pountain 1987). 

Because the transputer graphics network used in
this example is a multiple-instruction, multiple-data
stream (MIMD) parallel processor with no shared memory,
a method was devised to distribute the graphics data
that is normally globally scoped to the entire network
of processors. This was required to keep track of which
window or screen coordinate system was active so the
scan-conversion tasks would be performed properly.

When the input buffer process gets a tag.global
command, the command is sent both to the staging
buffers on the current processor and to the adjacent
processor. Routing decisions are made so that each
processor gets only one copy of the tag.global command
and its respective data packet. The data flow for the
tag.global command is shown in Figure 3.

If the tag is tag.work, the next packet of data
is read from the input channel to determine the
destination of the work packet. The destination is
determined only by the processor number (see Figure 1).
Routing decisions are made by comparing the current
node number with the destination node. If the work
packet is for the current node, the data is sent to
the first of several staging buffers. If the work
packet is for another node, the data is sent to an
adjacent processor based on the routing algorithm.
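
The node-number comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes the nodes sit on a simple linear chain numbered in increasing order, which may not match the wiring in Figure 1, and the names Hop and route_work are hypothetical rather than taken from the original occam code.

```python
from enum import Enum


class Hop(Enum):
    """Where the input buffer forwards a tag.work packet (hypothetical names)."""
    STAGING = "first staging buffer on this node"
    NEXT = "adjacent node with the next higher number"
    PREVIOUS = "adjacent node with the next lower number"


def route_work(current_node: int, destination_node: int) -> Hop:
    """Pick the next hop by comparing node numbers only.

    Assumption: nodes form a linear chain; the actual network shown in
    Figure 1 may be wired differently.
    """
    if destination_node == current_node:
        return Hop.STAGING      # packet is for this node
    if destination_node > current_node:
        return Hop.NEXT         # pass it up the chain
    return Hop.PREVIOUS         # pass it down the chain
```

Because the decision depends only on the two node numbers, every input buffer can run the same routine without any knowledge of the packet contents.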

The return buffer is also used when routing the
tag.work commands, but only by processors 0 and 8.
This allows data to be sent to the processing node
below for the tag.work or tag.global commands.

See Figure 3 for the network data flow for the
tag.global command.

The staging buffer processes are used to buffer
pending graphics command requests as well as to keep
the network from deadlocking. It is the input master
control node's responsibility to keep track of how
many of each command are pending on the network. The
number of commands for each node cannot exceed the
number of staging buffers in the network or the network
will deadlock. When the output master control buffer
receives data from the network (this is discussed
below), a command is sent to the input master control
node to decrement the command counter for the
appropriate graphics command.
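
The bookkeeping described above amounts to a simple credit scheme, sketched here in Python. The class and method names are hypothetical, and the sketch assumes one counter per compute node and a fixed number of staging buffers per node; the original occam code may organize its counters differently.

```python
class PendingCounter:
    """Tracks commands in flight so no node is sent more than it can stage."""

    def __init__(self, staging_buffers_per_node: int) -> None:
        self.limit = staging_buffers_per_node
        self.pending = {}  # node number -> commands currently on the network

    def try_send(self, node: int) -> bool:
        """Input master side: reserve a staging buffer, or refuse if none is free."""
        count = self.pending.get(node, 0)
        if count >= self.limit:
            return False  # sending now could deadlock the network
        self.pending[node] = count + 1
        return True

    def result_received(self, node: int) -> None:
        """Output master side: a result came back, so release one buffer."""
        count = self.pending.get(node, 0)
        if count == 0:
            raise ValueError("no pending command recorded for this node")
        self.pending[node] = count - 1
```

When try_send refuses a command, the input master simply holds it until the output master reports a returned result for that node.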

The staging buffers act as FIFO queues and the
requested graphics commands step through the staging
buffers outputting to the graphics computation process.
It is the graphics computation process that
performs the scan-conversion task for the requested
command. The computed data is packed into an array.
The process then waits until a tag.dump command is
received before dumping the computed data to the
return buffer process.
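
As an illustration of a graphics computation process, the Python sketch below scan-converts a line with a general integer Bresenham algorithm (the algorithm family used for the line test case in the PERFORMANCE section), packs the pixels into an array, and holds them until a dump is requested. The class name ComputeNode, its methods, and the list-of-tuples packing are assumptions made for this sketch, not the original implementation.

```python
def bresenham_line(x0, y0, x1, y1):
    """Integer-only line scan-conversion; returns the pixels from start to end."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:      # step in x
            err += dy
            x0 += sx
        if e2 <= dx:      # step in y
            err += dx
            y0 += sy
    return points


class ComputeNode:
    """Holds the packed result until a dump request arrives (hypothetical)."""

    def __init__(self):
        self.packed = None

    def work(self, x0, y0, x1, y1):
        # Corresponds to receiving a tag.work packet for a line command.
        self.packed = bresenham_line(x0, y0, x1, y1)

    def dump(self):
        # Corresponds to tag.dump: hand the array to the return buffer.
        data, self.packed = self.packed, None
        return data
```

Separating work from dump is what lets a node finish early without its pixels reaching the display board out of turn.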

If the tag is tag.dump, the input buffer sends a
message directly to the graphics computation process
and bypasses the staging buffers completely. When
the graphics computation process receives the tag.dump
command, it sends its computed data to the return
buffer process where it is routed to the output master
node. Figure 4 shows the routing from the output
master control node for the tag.dump command.

Once the computed data are received by the return
buffer process, the tag.result tag is attached to
the beginning of the data packet to identify it as
computed data. Based on the current processor number,
routing decisions are made to send the data to the
output master control node. The data flow to the
output master control node for tag.result is shown in
Figure 5.
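
Taken together, the FIFO copy of each command held on the output master control node and the tag.dump request mean that results are pulled from the network in request order, whatever order the nodes finish in. A minimal Python sketch of that idea follows; the function name, the tuple layout, and the use of None for commands that bypass the network are illustrative assumptions, not the original occam implementation.

```python
from collections import deque


def drain_in_request_order(output_fifo, node_results):
    """Return (command, result) pairs in the order the application issued them.

    output_fifo:  iterable of (command_name, node) pairs; node is None for
                  commands that were never sent to the graphics network.
    node_results: dict mapping node number -> deque of results held on that
                  node, filled in whatever order the computations finished.
    """
    ordered = []
    fifo = deque(output_fifo)
    while fifo:
        command, node = fifo.popleft()
        if node is None:
            ordered.append((command, None))  # pass straight to the display board
        else:
            # Stands in for the tag.dump request and the tag.result reply.
            ordered.append((command, node_results[node].popleft()))
    return ordered
```

The completion order never appears in the function: only the FIFO decides what reaches the display board next.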

SYNCHRONIZATION

On a network of processors such as the one used 
for the graphics engine, it is not possible to deter- 
mine the exact order of completion of any of the com- 
putations distributed on the network. Initially, it 
may not seem important to maintain the correct 
sequence of drawing commands that get sent to the dis- 
play processor; however, when dealing with multiple 
windows, multiple drawing colors, and screen double- 
buffering, the order the display board receives the 
commands is critical. For example, if the application
sends the following sequence of commands:

draw.line()
select.window()
draw.line()

and the graphics network completes the computations in
a different order such as:

select.window()
draw.line()
draw.line()

the desired result will not be achieved.

The output master control node controls the
synchronization of the graphics network. It does this
by reading the commands stored in its FIFO queue and
then sending a tag.dump command to the graphics
network that gets routed to the appropriate primitive
computation node. The command signals the primitive
computation node to send its result back to the
output master control node. Regardless of the completion
order of the computations distributed on the network,
the output node controls the data flow to the graphics
display board. This maintains the correct FIFO
ordering of the requested graphics display commands.

PERFORMANCE

The transputer serial links used on the graphics
network are set to transfer data at 10 Mbits/sec
(INMOS 1986). To determine whether the devised network
did indeed increase the graphics throughput rate,
several test cases were run to measure the
scan-conversion, data transfer, and data display times.

The tests were performed on a subset of the full
network. Two processors, a compute node and the display
board, were used, and all timings were taken using the
transputer's high-priority microsecond resolution
timer.

The results for two test cases are shown below
in Tables 1 and 2. The distributed results are
compared to the computation/display times for the
display board alone. All computations were performed in
integer device coordinates.

The first case tested was a line scan-conversion
using Bresenham's integer line algorithm (Foley and
Van Dam 1982). Times for scan-conversion, data
transfer, and display are given in Table 1.

The second case tested was a circle scan-conversion
using Bresenham's integer circle algorithm.
These results are shown in Table 2.

Unfortunately, the results presented above do
not show any advantage to performing the distributed
computation of graphics primitives. With the current
data transfer rates, the data transfer time dominates
over the computational time.

A more advantageous computation scheme would be
to perform more computation on each node in the
network. For example, the mapping from three-dimensional
world coordinates to two-dimensional integer device
coordinates could be performed on the graphics
computation network and then only the integer draw
commands would have to be sent to the display board.

SUMMARY

An array of transputers was designed to speed
graphics computations by offloading scan-conversion
tasks and freeing the display processor for display
only. Because of the current bandwidth of the
transputer serial links, and the small computation time
for the scan-conversions tested, a performance
degradation was observed on the network when compared to
the display board by itself. Until higher bandwidth
serial links are available on the transputers or more
computation is performed on each node of the graphics
network for every communication, this method of
distributing the graphics workload does not offer any
performance gains.

TABLE 1. - COMPARISON OF LINE COMPUTE/DISPLAY TIMES

Operations performed                         Time, μs

Scan convert line (0,0) to (511,511)             7933
Transmit computed data to display board        12 512
Transmit data and display line                 28 400
Scan convert, transmit, and display            36 399
Graphics board draw line command               14 887
Graphics board fast draw line command            3542
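
The ratios implied by Table 1 can be checked with a few lines of Python. The dictionary keys are labels chosen for this sketch; the numbers are the table entries, and only unit-free ratios are computed.

```python
# Times from Table 1, in the table's own units.
line_times = {
    "scan_convert": 7933,
    "transmit": 12512,
    "transmit_and_display": 28400,
    "distributed_total": 36399,
    "board_draw": 14887,
    "board_fast_draw": 3542,
}

# How much longer the link transfer takes than the scan-conversion itself.
transfer_to_compute = line_times["transmit"] / line_times["scan_convert"]

# Slowdown of the distributed path relative to the display board alone.
slowdown_vs_draw = line_times["distributed_total"] / line_times["board_draw"]
slowdown_vs_fast = line_times["distributed_total"] / line_times["board_fast_draw"]

print(f"transfer / compute:        {transfer_to_compute:.2f}")
print(f"distributed / draw:        {slowdown_vs_draw:.2f}")
print(f"distributed / fast draw:   {slowdown_vs_fast:.2f}")
```

For the line test, transmitting the computed data takes roughly 1.6 times as long as computing it, and the full distributed path is about 2.4 times slower than the board's draw line command and about 10 times slower than its fast draw line command.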

TABLE 2. - COMPARISON OF CIRCLE COMPUTE/DISPLAY TIMES

FIGURE 1. - MULTIPLE PROCESSOR GRAPHICS DISPLAY ENGINE SHOWING
PROCESSOR NUMBERS FOR ROUTING COMPUTATIONS.

FIGURE 2. - COMPUTE NODE BUFFER PROCESSES AND COMMUNICATION
ROUTING.

FIGURE 4. - DATA FLOW FOR REQUESTING INFORMATION FROM NETWORK.

FIGURE 5. - DATA FLOW FOR SENDING COMPUTED DATA TO OUTPUT MASTER
NODE.